Extracting Hyponyms of Prespecified Hypernyms from Itemizations and Headings in Web Documents

نویسندگان

  • Keiji Shinzato
  • Kentaro Torisawa
چکیده

This paper describes a method to acquire hyponyms for given hypernyms from HTML documents on the WWW. We assume that a heading (or explanation) of an itemization (or listing) in an HTML document is likely to contain a hypernym of the items in the itemization, and we try to acquire hyponymy relations based on this assumption. Our method is obtained by extending Shinzato’s method (Shinzato and Torisawa, 2004) where a common hypernym for expressions in itemizations in HTML documents is obtained by using statistical measures. By using Japanese HTML documents, we empirically show that our proposed method can obtain a significant number of hyponymy relations which would otherwise be missed by alternative methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Relational Adjectives for Extracting Hyponyms from Medical Texts

We expose a method for extracting hyponyms and hypernyms from analytical definitions, focusing on the relation observed between hypernyms and relational adjectives (e.g., cardiovascular disease). These adjectives introduce a set of specialized features according to a categorization proper to a particular knowledge domain. For detecting these sequences of hypernyms associated to relational adjec...

متن کامل

Acquisition of Hypernyms and Hyponyms from the WWW

Recently research in automatic ontology construction has become a hot topic, because of the vision that ontology will be the core component to realize the semantic web. This paper presents a method to automatically construct ontology by mining the web. We introduce an algorithm to automatically acquire hypernyms and hyponyms for any given lexical term using search engine and natural language pr...

متن کامل

Nine Features in a Random Forest to Learn Taxonomical Semantic Relations

ROOT9 is a supervised system for the classification of hypernyms, co-hyponyms and random words that is derived from the already introduced ROOT13 (Santus et al., 2016). It relies on a Random Forest algorithm and nine unsupervised corpus-based features. We evaluate it with a 10-fold cross validation on 9,600 pairs, equally distributed among the three classes and involving several Parts-Of-Speech...

متن کامل

Using Lexical Patterns for Extracting Hyponyms from the Web

This paper describes a method for extracting hyponyms from free text. In particular it explores two main matters. On the one hand, the possibility of reaching favorable results using only lexical extraction patterns. On the other hand, the usefulness of measuring the instance’s confidences based on the pattern’s confidences, and vice versa. Experimental results are encouraging because they show...

متن کامل

A Phrase-based Ontology Enabled Semantic Processing System for Web Search

Semantic processing system (SPS) is a system that performs phrase search of web content. SPS takes a user query in natural language, converts it to a keyword query, expands the keyword query with synonyms, hypernyms, hyponyms, and meronyms, and presents the keyword query to a search engine. SPS then sifts through the search engine result pages extracting grammatical and semantic information fro...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004